Abstract: We present an achievable rate for general
deterministic relay networks, with broadcasting at the
transmitters and interference at the receivers. In particular,
we show that if the optimizing distribution for the
information-theoretic cut-set bound is a product distribution,
then we have a complete characterization of the achievable
rates for such networks. For linear deterministic finite-field
models discussed in a companion paper [3], this is indeed the
case, and we have a generalization of the celebrated max-flow
min-cut theorem for such a network.

I. Introduction 

Consider a network represented by a directed relay
network $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ where $\mathcal{V}$ are the vertices
representing the communication nodes in the relay network. The
communication problem considered is unicast (or multicast
with all destinations requesting the same message).
Therefore a special node $S \in \mathcal{V}$ is considered the source
of the message and a special node $D \in \mathcal{V}$ is the intended
destination. All other nodes in the network facilitate
communication between $S$ and $D$. In a wireline network,
such as studied in [1], the edges $\mathcal{E}$ of the network do not
interact and are orthogonal communication channels. In
this paper, transmissions are not necessarily orthogonal
and signals sent by the nodes in $\mathcal{V}$ can in general be
broadcast and also interfere with one another. In particular,
for each vertex $j \in \mathcal{V}$ of the network, there is only
one transmitted signal $x_j$ which is broadcast to the other
nodes connected to this vertex. Moreover it has only one
received signal $y_j$ which is a deterministic function of
all the signals transmitted by the nodes connected to
it. By connection we mean the nodes that have edges
belonging to the set $\mathcal{E}$. By deterministic we mean that
$y_j = g_j(\{x_k\}_{k \in \mathcal{N}_j})$, where $\mathcal{N}_j$ is the set of input
neighbors of node $j$. Therefore, we have deterministic broadcast
and multiple access channels incorporated into the model to
reflect physical layer effects.

This approach is motivated by the development of
the linear deterministic finite-field model for wireless
channels [2], and its connection to Gaussian relay
networks [3]. Historically, deterministic relay networks
were perhaps first studied in [4], where a deterministic 
model with broadcast but no multiple access was studied 
(the so-called Aref's networks). For such a network, the 
unicast capacity was determined in [4] and its extension 
to multicast capacity when all receivers needed the same 
message was done in [9]. A three-node determinis- 
tic relay network capacity was characterized in [10], 
where both broadcast and multiple access were allowed. 
Network coding is information flow on a very special 
class of deterministic networks, where all the links 
are non-interfering and orthogonal. For such networks, 
the unicast capacity is given by the classical max-flow 
min-cut theorem of Ford-Fulkerson, and the multicast 
capacity has been determined in the seminal work [1]. 
More recently, the capacity of a class of erasure relay
networks has been established where random erasures
attempt to model the noise and collisions [12]. In
all these cases, where the characterization exists, the
information-theoretic cut-set bound was achievable. Recently,
a relay network where the cut-set bound is not tight has
been demonstrated in [5].

We first consider general deterministic functions to
model the broadcast and multiple access channels. For
such networks we show an achievability result which is tight
only for functions and networks where the independent
input distribution optimizes the information-theoretic
cut-set bound. For Aref's networks where there is no
interference, this is indeed the case and our result is a
generalization of his. For deterministic networks where
there is interference but the deterministic functions are
linear over a finite field, it turns out that the cut-set
bound is also optimized by the product distribution.
For this case, our result is a natural generalization of
the celebrated max-flow min-cut theorem. These ideas
are easily extended to the multicast case, where we
want to simultaneously transmit one message from $S$
to all destinations $D$ in the set $\mathcal{D}$. For the linear
finite-field model, we characterize the multicast capacity, and
therefore generalize the result in [1]. We will discuss this
in more detail in the next section.

II. Problem statement and main results 

A. General Deterministic network 

As stated in Section I, we consider a directed network
$\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where the received signal $y_j$ at node $j \in \mathcal{V}$
is given by

$$y_j = g_j(\{x_i\}_{i \in \mathcal{N}_j}) \qquad (1)$$

where we define the input neighbors $\mathcal{N}_j$ of $j$ as the set of
nodes whose transmissions affect $j$, and can be formally
defined as $\mathcal{N}_j = \{i : (i, j) \in \mathcal{E}\}$. Note that this implies
a deterministic multiple access channel for node $j$ and
a deterministic broadcast channel for the transmitting
nodes.

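For any relay network, there is a natural information-theoretic
cut-set bound [6], which upper bounds the reliable
transmission rate $R$. Applied to our model, we have:

$$R < \max_{p(\{x_j\}_{j \in \mathcal{V}})} \min_{\Omega \in \Lambda_D} I(Y_{\Omega^c}; X_\Omega | X_{\Omega^c}) \overset{(a)}{=} \max_{p(\{x_j\}_{j \in \mathcal{V}})} \min_{\Omega \in \Lambda_D} H(Y_{\Omega^c} | X_{\Omega^c}) \qquad (2)$$

where $\Lambda_D = \{\Omega : S \in \Omega, D \in \Omega^c\}$ is the set of all
source-destination cuts (partitions) and (a) follows since we are
dealing with deterministic networks.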

The following are our main results for general deter- 
ministic networks. 

Theorem 2.1: Given a general deterministic relay network
(with broadcast and multiple access), we can
achieve all rates $R$ up to

$$\max_{\prod_{i \in \mathcal{V}} p(x_i)} \min_{\Omega \in \Lambda_D} H(Y_{\Omega^c} | X_{\Omega^c}) \qquad (3)$$

This theorem is easily extended to the multicast case, where
we want to simultaneously transmit one message from
$S$ to all destinations $D$ in the set $\mathcal{D}$:

Theorem 2.2: Given a general deterministic relay network
(with broadcast and multiple access), we can
achieve all rates $R$ from $S$ multicasting to all destinations
$D \in \mathcal{D}$ up to

$$\max_{\prod_{i \in \mathcal{V}} p(x_i)} \min_{D \in \mathcal{D}} \min_{\Omega \in \Lambda_D} H(Y_{\Omega^c} | X_{\Omega^c}) \qquad (4)$$

This achievability result in Theorem 2.1 extends the
results in [9] where only deterministic broadcast networks
(with no interference) were considered.



Note that when we compare (3) to the cut-set upper
bound in (2), we see that the difference is in the maximizing
set, i.e., we are only able to achieve independent
(product) distributions whereas the cut-set optimization
is over any arbitrary distribution. In particular, if the
network and the deterministic functions are such that
the cut-set bound is optimized by the product distribution, then
we would have matching upper and lower bounds. This
indeed happens when we consider the linear finite-field
model discussed below.
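To make the cut value $H(Y_{\Omega^c} | X_{\Omega^c})$ concrete, the following is a minimal Python sketch, not part of the paper, that evaluates it for a single receiver with two transmitters under a product input distribution. The function name `cond_entropy_deterministic` and the two toy channel functions (XOR and AND on bits) are illustrative assumptions.

```python
from itertools import product
from math import log2

def cond_entropy_deterministic(g, pmfs, cond):
    """H(Y | X_cond) in bits for y = g(x_1, ..., x_n) with independent inputs.

    g    : deterministic function of the n inputs (hypothetical channel).
    pmfs : list of dicts, pmfs[i][x] = probability that input i equals x.
    cond : tuple of input indices that are conditioned on (the side Omega^c).
    """
    joint = {}  # (x_cond, y) -> probability
    marg = {}   # x_cond -> probability
    for xs in product(*[list(pm) for pm in pmfs]):
        pr = 1.0
        for x, pm in zip(xs, pmfs):
            pr *= pm[x]          # product (independent) distribution
        if pr == 0.0:
            continue
        key = tuple(xs[i] for i in cond)
        y = g(*xs)
        joint[(key, y)] = joint.get((key, y), 0.0) + pr
        marg[key] = marg.get(key, 0.0) + pr
    h_joint = -sum(q * log2(q) for q in joint.values())
    h_marg = -sum(q * log2(q) for q in marg.values())
    return h_joint - h_marg      # chain rule: H(Y|X) = H(X,Y) - H(X)

uniform = {0: 0.5, 1: 0.5}
# Input 0 is on the source side of the cut, input 1 on the destination side.
print(cond_entropy_deterministic(lambda a, b: a ^ b, [uniform, uniform], cond=(1,)))
print(cond_entropy_deterministic(lambda a, b: a & b, [uniform, uniform], cond=(1,)))
```

Under these assumptions the linear (XOR) function delivers a full bit across the cut with uniform inputs, while the AND function delivers less, which illustrates why the optimizing input distribution depends on the deterministic functions.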



B. Linear Finite-Field Deterministic network



A special deterministic model which is motivated [3]
by its close connection to the Gaussian model is the
linear finite-field model, where the received signal
$y_j \in \mathbb{F}_p^q$ is a vector defined over a finite field $\mathbb{F}_p$ given by

$$y_j = \sum_{i \in \mathcal{V}} G_{ij} x_i \qquad (5)$$

where the transmitted signals $x_i \in \mathbb{F}_p^q$, and the "channel"
matrices $G_{ij} \in \mathbb{F}_p^{q \times q}$. All the operations are done
over the finite field $\mathbb{F}_p$, and the network $\mathcal{G}$ implies that
$G_{ij} = 0$ for $i \notin \mathcal{N}_j$, reducing the sum in (5) from $|\mathcal{V}|$
terms, i.e., all transmitting nodes in the network, to just
the input neighbors of $j$.

If we look at the cut-set upper bound (2) for general
deterministic networks, it is easy to see that in the special case
of linear finite-field deterministic networks all cut
values are simultaneously optimized by an independent and
uniform distribution of $\{x_i\}_{i \in \mathcal{V}}$. Moreover the optimum
value of each cut is the logarithm of the size of the range
space of the transfer matrix $G_{\Omega,\Omega^c}$ associated with that
cut, i.e., the matrix relating the super-vector of all the
inputs at the nodes in $\Omega$ to the super-vector of all the
outputs in $\Omega^c$ induced by (5). This yields the following
complete characterization as corollaries of Theorems
2.1 and 2.2.

Corollary 2.3: Given a linear finite-field relay network
(with broadcast and multiple access), the capacity
$C$ of such a relay network is given by

$$C = \min_{\Omega \in \Lambda_D} \operatorname{rank}(G_{\Omega,\Omega^c}) \log p. \qquad (6)$$

Corollary 2.4: Given a linear finite-field relay network
(with broadcast and multiple access), the multicast
capacity $C$ of such a relay network is given by

$$C = \min_{D \in \mathcal{D}} \min_{\Omega \in \Lambda_D} \operatorname{rank}(G_{\Omega,\Omega^c}) \log p. \qquad (7)$$

For a single source-destination pair the result in
Corollary 2.3 generalizes the classical max-flow min-cut
theorem for wireline networks and for multicast, the
result in Corollary 2.4 generalizes the network coding
result in [1], where in both these earlier results the
communication links are orthogonal. Moreover, as we
will see in the proof, the encoding functions at the relay
nodes could be restricted to linear functions to obtain the
result in Corollary 2.3.
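The min-cut expression in Corollary 2.3 can be evaluated by brute force on a small network. The following Python sketch is an illustration under simplifying assumptions that are not in the paper: every node sends a scalar ($q = 1$), $p$ is prime, and the link gains are stored in a hypothetical dictionary `gains[(i, j)]`. The helper names `rank_mod_p` and `min_cut_rank` are likewise illustrative.

```python
from itertools import combinations

def rank_mod_p(rows, p):
    """Rank of a matrix (list of rows) over the prime field F_p,
    computed by Gauss-Jordan elimination."""
    m = [[v % p for v in r] for r in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for c in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][c]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][c], -1, p)  # modular inverse (Python 3.8+)
        m[rank] = [(v * inv) % p for v in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][c]:
                f = m[r][c]
                m[r] = [(a - f * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def min_cut_rank(relays, gains, source, dest, p):
    """Minimum of rank(G_{Omega,Omega^c}) over all cuts Omega with the
    source in Omega and the destination in Omega^c (scalar links, q = 1)."""
    best = None
    for r in range(len(relays) + 1):
        for subset in combinations(relays, r):
            omega = [source, *subset]
            omega_c = [v for v in relays if v not in subset] + [dest]
            # Row j, column i holds the gain from transmitter i to receiver j.
            G = [[gains.get((i, j), 0) for i in omega] for j in omega_c]
            rk = rank_mod_p(G, p)
            best = rk if best is None else min(best, rk)
    return best

# Two-relay-layer example over F_2 with all link gains equal to 1 (assumed).
relays = ["A1", "A2", "B1", "B2"]
links = [("S", "A1"), ("S", "A2"), ("A1", "B1"), ("A1", "B2"),
         ("A2", "B1"), ("A2", "B2"), ("B1", "D"), ("B2", "D")]
gains = {link: 1 for link in links}
print(min_cut_rank(relays, gains, "S", "D", p=2))
```

Multiplying the returned rank by $\log p$ gives the capacity in (6). The enumeration is exponential in the number of relays, so this is only meant for small examples.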

C. Proof Strategy 

Theorem 2.1 is the main result of the paper and the
rest of the paper is devoted to proving it. First we focus
on networks that have a layered structure, i.e., all paths
from the source to the destination have equal lengths.
With this special structure we get a major simplification:
a sequence of messages can each be encoded into a block
of symbols and the blocks do not interact with each other
as they pass through the relay nodes in the network.
The proof of the result for layered networks is similar in
style to the random coding argument in [1]. We do this
in Sections III, IV and V, first for the linear finite-field
model in Sections III and IV and then for the general
deterministic model in Section V. Second, we extend the result
to an arbitrary network by considering its time-expanded
representation.^1 The time-expanded network is layered and we
can apply our result in the first step to it. To complete the
proof of the result, we need to establish a connection between
the cut values of the time-expanded network and those
of the original network. We do this using sub-modularity
properties of entropy in Section VI.

III. Linear Model: An Example 

In this section we give the encoding scheme for the
linear deterministic model of (5) in Section III-A. In
Section III-B we illustrate the proof techniques on a
simple linear unicast relay network example.

A. Encoding for linear deterministic model 

We have a single source $S$ with message
$W \in \{1, 2, \ldots, 2^{KTR}\}$ which is encoded by the source $S$ into
a signal over $KT$ transmission times (symbols), giving
an overall transmission rate of $R$. Each relay operates
over blocks of time $T$ symbols, and uses a mapping
$f_j^{(k)} : \mathcal{Y}_j^T \to \mathcal{X}_j^T$ on its received symbols from the previous
block of $T$ symbols to transmit signals in the next block.
In particular, block $k$ of $T$ received symbols is denoted
by $y_j^{(k)} = \{y_j^{[(k-1)T+1]}, \ldots, y_j^{[kT]}\}$ and the transmit
symbols by $x_j^{(k)}$. For the linear deterministic model we will
use linear mappings $f_j(\cdot)$, i.e.,

$$x_j^{(k)} = F_j^{(k)} y_j^{(k-1)} \qquad (8)$$

where $F_j^{(k)}$ is chosen uniformly randomly over all matrices
in $\mathbb{F}_p^{qT \times qT}$. Each relay does the encoding prescribed by
(8). Given the knowledge of all the encoding functions
$F_j$ at the relays and signals received over $K + |\mathcal{V}| - 2$
blocks, the decoder $D \in \mathcal{D}$ attempts to decode the
message $W$ sent by the source.

^1 The concept of time-expanded representation is also used in [1],
but the use there is to handle cycles. Our main use is to handle
interaction between messages transmitted at different times, an issue
that only arises when there is interference at nodes.
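The relay operation in (8) is a matrix-vector product over $\mathbb{F}_p$. A minimal Python sketch follows; the helper names `random_matrix`, `mat_vec_mod_p` and `relay_encode`, the seed, and the toy parameters are assumptions made for illustration only.

```python
import random

def random_matrix(n, p, rng):
    """n x n matrix with entries drawn uniformly from F_p = {0, ..., p-1}."""
    return [[rng.randrange(p) for _ in range(n)] for _ in range(n)]

def mat_vec_mod_p(F, y, p):
    """Matrix-vector product F y with all arithmetic done modulo p."""
    return [sum(a * b for a, b in zip(row, y)) % p for row in F]

def relay_encode(F, y_prev_block, p):
    """Transmit block x^(k) = F y^(k-1), following the form of (8)."""
    return mat_vec_mod_p(F, y_prev_block, p)

rng = random.Random(0)           # fixed seed so the sketch is repeatable
p, q, T = 2, 2, 3                # toy field size, vector length, block length
F = random_matrix(q * T, p, rng) # one random qT x qT encoding matrix
y_prev = [rng.randrange(p) for _ in range(q * T)]
print(relay_encode(F, y_prev, p))
```

Because the map is linear, two received blocks that are identical always produce identical transmit blocks, which is the property the distinguishability argument of Section III-B relies on.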

B. Proof illustration 

In order to illustrate the proof ideas of Theorem 2.1
we examine the network shown in Figure III-B. We will
analyze this network first for the linear deterministic model
and then we use the same example to illustrate the ideas
for general deterministic functions in Section V-B.




The network given in Figure III-B is an example of
a layered network where the number of "hops" for each
path from $S$ to $D$ is equal to 3 in this case.^2 The key
simplification that occurs for layered networks is that we can
divide the message $W$ into $K$ parts (sub-messages), each
sub-message $w_k$ taking values in $\{1, 2, \ldots, 2^{TR}\}$, $k = 1, \ldots, K$. By
doing this in Figure III-B we see that, for example, nodes
$A_1, A_2$ are sending signals which pertain to the same
sub-message $w_k$. Therefore, the "interfering" signals at
node $B_1$ are both about the same sub-message. This is a
statement that holds in general for layered networks. For
example in block number $k = 3$, the source is sending a
signal about $w_3$, $A_1, A_2$ are sending signals that depend
on $w_2$ and $B_1, B_2$ in turn are sending a signal to $D$ which
depends on $w_1$. This message synchronization implies
that we can focus our attention on the error probability of
a single sub-message $w = w_1$ without loss of generality.

^2 Note that in the equal path network we do not have
"self-interference" since all path-lengths from $S$ to $D$ in terms of "hops"
are equal, though as we will see in the analysis that can easily
be taken care of. However we do allow for self-interference in the
model and we choose to handle such loops, and more generally cyclic
networks, through time-expansion as will be seen in Section VI.



Now, since we have a deterministic network, the
message $w$ will be mistaken for another message $w'$ only
if the received signal $y_D(w)$ under $w$ is the same as
that which would have been received under $w'$. This leads to a
notion of distinguishability, which is that messages $w, w'$
are distinguishable at any node $j$ if $y_j(w) \neq y_j(w')$.

The probability of error at decoder $D$ can be upper
bounded using the union bound as

$$P_e \le 2^{RT} P\{w \to w'\} = 2^{RT} P\{y_D^{(3)}(w) = y_D^{(3)}(w')\}. \qquad (9)$$

For the deterministic network, this event is random only
due to the randomness in the encoder map. Therefore,
the probability of this event depends on the probability
that we choose such an encoder map. Now, we can write

$$P\{w \to w'\} = \sum_{\Omega \in \Lambda_D} P\{\text{Nodes in } \Omega \text{ can distinguish } w, w' \text{ and nodes in } \Omega^c \text{ cannot}\} \qquad (10)$$

since the events that correspond to occurrence of the
distinguishability sets $\Omega \in \Lambda_D$ are disjoint. Let us examine
one term in the summation in (10). The distinguishability
of $w = w_1$ from $w' = w'_1$ for the nodes $A_1, A_2$ is
from signals $y_{A_1}^{(1)}, y_{A_2}^{(1)}$, for the nodes $B_1, B_2$ it is from
signals $y_{B_1}^{(2)}, y_{B_2}^{(2)}$ and for the receiver $D$ it is $y_D^{(3)}(w)$.
For notational simplicity we will drop the block numbers
associated with the transmitted and received signals for
this analysis.

For the cut $\Omega = \{S, A_1, B_1\}$, a necessary condition
for the distinguishability set to be this cut is that
$y_{A_2}(w) = y_{A_2}(w')$, along with $y_{B_2}(w) = y_{B_2}(w')$ and
$y_D(w) = y_D(w')$. Since the source does a random linear
mapping of the message onto $x_S(w)$, the probability that
$y_{A_2}(w) = y_{A_2}(w')$ is given by

$$P\{(I_T \otimes G_{S,A_2})(x_S(w) - x_S(w')) = 0\} = p^{-T \operatorname{rank}(G_{S,A_2})} \qquad (11)$$

since the random mapping given in (8) induces independent
uniformly distributed $x_S(w), x_S(w')$. Here,
$\otimes$ is the Kronecker matrix product. Now, in order to
analyze the probability that $y_{B_2}(w) = y_{B_2}(w')$, we see
that since $y_{A_2}(w) = y_{A_2}(w')$, $x_{A_2}(w) = x_{A_2}(w')$, i.e.,
the same signal is sent under both $w, w'$. Therefore, we
get the probability of $y_{B_2}(w) = y_{B_2}(w')$ given that the
distinguishability set is $\Omega = \{S, A_1, B_1\}$, as

$$P\{(I_T \otimes G_{A_1,B_2})(x_{A_1}(w) - x_{A_1}(w')) = 0\} = p^{-T \operatorname{rank}(G_{A_1,B_2})}. \qquad (12)$$

Similarly we get

$$P\{y_D(w) = y_D(w') \mid \text{distinguishability set } \Omega\} = P\{(I_T \otimes G_{B_1,D})(x_{B_1}(w) - x_{B_1}(w')) = 0\} = p^{-T \operatorname{rank}(G_{B_1,D})}. \qquad (13)$$

Putting these together, since all three would need to
occur, we see that in (10), for the network in Figure
III-B we have

$$\mathcal{P} \le p^{-T \operatorname{rank}(G_{S,A_2})} p^{-T \operatorname{rank}(G_{A_1,B_2})} p^{-T \operatorname{rank}(G_{B_1,D})} = p^{-T\{\operatorname{rank}(G_{S,A_2}) + \operatorname{rank}(G_{A_1,B_2}) + \operatorname{rank}(G_{B_1,D})\}}. \qquad (14)$$



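Note that since in this example

$$G_{\Omega,\Omega^c} = \begin{bmatrix} G_{S,A_2} & 0 & 0 \\ 0 & G_{A_1,B_2} & 0 \\ 0 & 0 & G_{B_1,D} \end{bmatrix},$$

the upper bound for $\mathcal{P}$ in (14) is exactly $p^{-T \operatorname{rank}(G_{\Omega,\Omega^c})}$.
Therefore, by substituting this back into (10) and (9), we
see that

$$P_e \le 2^{RT} |\Lambda_D| \, p^{-T \min_{\Omega \in \Lambda_D} \operatorname{rank}(G_{\Omega,\Omega^c})}, \qquad (15)$$

which can be made as small as desired if
$R < \min_{\Omega \in \Lambda_D} \operatorname{rank}(G_{\Omega,\Omega^c}) \log p$, which is the result claimed
in Corollary 2.3.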
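The probabilities in (11) to (13) rest on one counting fact: a uniformly distributed vector over $\mathbb{F}_p$ falls in the null space of $I_T \otimes G$ with probability $p^{-T \operatorname{rank}(G)}$. The Python sketch below checks this by exhaustive enumeration for a small matrix; the function names and the example matrix are illustrative assumptions, and the enumeration is only feasible for tiny dimensions.

```python
from fractions import Fraction
from itertools import product

def mat_vec_mod_p(G, v, p):
    """Matrix-vector product G v with arithmetic modulo p."""
    return [sum(a * b for a, b in zip(row, v)) % p for row in G]

def null_space_fraction(G, p):
    """Fraction of vectors v in F_p^n with G v = 0, found by enumeration."""
    n = len(G[0])
    hits = sum(1 for v in product(range(p), repeat=n)
               if not any(mat_vec_mod_p(G, v, p)))
    return Fraction(hits, p ** n)

def kron_identity(G, T):
    """I_T (Kronecker product) G: block-diagonal matrix with T copies of G."""
    rows, cols = len(G), len(G[0])
    out = [[0] * (T * cols) for _ in range(T * rows)]
    for t in range(T):
        for r in range(rows):
            for c in range(cols):
                out[t * rows + r][t * cols + c] = G[r][c]
    return out

G = [[1, 1], [1, 1]]   # rank 1 over F_2 (both rows equal)
p, T = 2, 3
print(null_space_fraction(G, p))                    # p^{-rank(G)}
print(null_space_fraction(kron_identity(G, T), p))  # p^{-T rank(G)}
```

For this matrix the single-use fraction is $2^{-1}$ and the $T = 3$ block fraction is $2^{-3}$, matching the exponent $T \operatorname{rank}(G)$ used above.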

These ideas motivate first focusing on layered networks
as done in Section IV. The major simplification
that we get in this case is that the signals associated
with different messages do not get mixed in the network
and hence we can focus on only one message. Note that
another simplification in layered (equal path) networks
is that for a given node $j$, it is enough to choose the
same encoding function $f_j$ for each block $k$.

Now the general result for layered networks is proved
in two parts: first for the linear deterministic model and then
for the general deterministic model.

IV. Layered networks: linear deterministic model



In this section we prove the main Corollaries 2.3 and 2.4
for layered networks. In a layered network, for each
node $j$ we have a length $l_j$ from the source and all
the incoming signals to node $j$ are from nodes $i$ whose
distance from the source is $l_i = l_j - 1$. Therefore, as in
the example network of Figure III-B, we see that there
is message synchronization, i.e., all signals arriving at
node $j$ are encoding the same sub-message.

Suppose message $w_k$ is sent by the source in block $k$,
then since each relay $j$ operates only on blocks of length
$T$, the signals received at block $k$ at any relay pertain
to only message $w_{k-l_j}$ where $l_j$ is the path length from
the source to relay $j$. To explicitly indicate this we denote by
$y_j^{(k)}(w_{k-l_j}) \in \mathbb{F}_p^{qT}$ the received signal at block $k$ at
node $j$. We also denote the transmitted signal at block $k$
as $x_j^{(k)}(w_{k-1-l_j}) \in \mathbb{F}_p^{qT}$ which is obtained by randomly
mapping $y_j^{(k-1)}(w_{k-1-l_j}) \in \mathbb{F}_p^{qT}$.

Since we have a layered network, without loss of
generality consider the message $w = w_1$ transmitted
by the source at block $k = 1$. At node $j$ the signals
pertaining to this message are received by the relays at
block $l_j$. We analyze an $l_D$-layer network, where each layer is
a MIMO sub-network. Therefore, as in the analysis of
(10), we see that

$$P\{w \to w'\} = \sum_{\Omega \in \Lambda_D} P\{\text{Nodes in } \Omega \text{ can distinguish } w, w' \text{ and nodes in } \Omega^c \text{ cannot}\}. \qquad (16)$$



We define $G_{\Omega,\Omega^c}$ as the transfer matrix associated with
the nodes in $\Omega$ to the nodes in $\Omega^c$. Note that since we
have a layered network this transfer matrix breaks up into
block diagonal elements corresponding to each of the
$l_D$ layers of the network. More precisely, we can create
$d = l_D$ disjoint sub-networks of nodes corresponding to
each layer of the network, with $\beta_l(\Omega)$ nodes at distance
$l - 1$ from $S$ that are in $\Omega$, on one side and $\gamma_l(\Omega)$ nodes
at distance $l$ from $S$ that are in $\Omega^c$, on the other, for
$l = 1, \ldots, l_D$.

Each node $i \in \beta_l(\Omega)$ sees a signal related to $w = w_1$ in
block $l_i = l - 1$, and therefore waits to receive this block
and then does a random mapping to $x_i^{(l_i)}(w) \in \mathbb{F}_p^{qT}$. The
random mapping is done as in (8) by choosing a random
matrix $F_i$ of size $Tq \times Tq$ and creating

$$x_i^{(l_i)}(w) = F_i \, y_i^{(l_i)}(w). \qquad (17)$$

The received signals in the nodes $j \in \gamma_l(\Omega)$ are linear
transformations of the transmitted signals from nodes
$\mathcal{T}_l = \{u : (u, v) \in \mathcal{E}, v \in \gamma_l(\Omega)\}$. That is, the output
depends not only on the transmitters in $\beta_l$, but also on other
transmitters at distance $l - 1$ from $S$ that are part of $\Omega^c$.
Since all the receivers in $\gamma_l$ are at distance $l$ from $S$,
they form the receivers of the MIMO layer $l$, and we
denote this vector received signal as $z_l(w)$, and this can
be done for all layers $l = 1, \ldots, l_D$. Note that as in the
example network of Section III-B, for all the transmitting
nodes in $\mathcal{T}_l$ which cannot distinguish between $w, w'$ the
transmitted signal would be the same under both $w$ and
$w'$. Therefore, in order to calculate the probability that
nodes in $\gamma_l$ cannot distinguish between $w, w'$, or that
$z_l(w) - z_l(w') = 0$, we see that

$$z_l(w) - z_l(w') = \tilde{G}_l \left[ u_l(w) - u_l(w') \right], \quad l = 1, \ldots, d \qquad (18)$$

where the transmitted signals from $\beta_1, \ldots, \beta_d$ are
clubbed together^3 and denoted by $u_l(w)$, $l = 1, \ldots, d$.
Also, due to the time-invariant channel conditions we
see that $\tilde{G}_l = I_T \otimes G_l$, where $\otimes$ is the Kronecker
product. Since we are trying to calculate the probability
that $z_l(w) = z_l(w')$, $l = 1, \ldots, d$, we need to
find the probability that $u_l(w) - u_l(w')$ lies in the null
space of $\tilde{G}_l$ for each $l = 1, \ldots, d$.

Now, if the distinct signals $y_i^{(l_i)}(w), y_i^{(l_i)}(w')$
received at the nodes $i \in \beta_l$ could be jointly uniformly
and independently mapped to the transmitted signals
$u_l(w), u_l(w')$, then we could say that the probability of
this occurrence is the ratio of the size of the null space to
the size of the whole space. Clearly this is given by

$$P\{u_l(w) - u_l(w') \in \mathcal{N}(\tilde{G}_l)\} = p^{-\operatorname{rank}(\tilde{G}_l)} = p^{-T \operatorname{rank}(G_l)}. \qquad (19)$$

Although the signals $y_i(w)$ are uniformly
randomly mapped individually at each node $i \in \beta_l$,
the overall map across all nodes in $\beta_l$ is also uniform,
and hence the probability given in (19) is the correct
one. Since the events in each of the stages/clusters are
independent, we get that

$$P\{u_l(w) - u_l(w') \in \mathcal{N}(\tilde{G}_l), \ l = 1, \ldots, d\} = \prod_{l=1}^{d} p^{-T \operatorname{rank}(G_l)} = p^{-T \sum_{l=1}^{d} \operatorname{rank}(G_l)}.$$

Therefore, we see that

$$\mathcal{P} \le p^{-T \sum_{l=1}^{d} \operatorname{rank}(G_l)}. \qquad (20)$$



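Now the probability of mistaking $w$ for $w'$ at receiver
$D \in \mathcal{D}$ is therefore

$$P\{w \to w'\} \le 2^{|\mathcal{V}|} \, p^{-T \min_{\Omega \in \Lambda_D} \operatorname{rank}(G_{\Omega,\Omega^c})},$$

where we have used $|\Lambda_D| \le 2^{|\mathcal{V}|}$. Note that we have
used the fact that since $G_{\Omega,\Omega^c}$ is block diagonal,
with blocks $G_l(\Omega)$, we have $\sum_{l=1}^{d} \operatorname{rank}(G_l(\Omega)) =
\operatorname{rank}(G_{\Omega,\Omega^c})$. If we declare an error if any receiver
$D \in \mathcal{D}$ makes an error, we see that since we have $2^{RT}$
messages, from the union bound we can drive the error
probability to zero if we have

$$R < \min_{D \in \mathcal{D}} \min_{\Omega \in \Lambda_D} \operatorname{rank}(G_{\Omega,\Omega^c}) \log p. \qquad (21)$$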
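The role of the rate condition can be seen numerically from the union bound $P_e \le 2^{RT} \, 2^{|\mathcal{V}|} \, p^{-T \min_\Omega \operatorname{rank}(G_{\Omega,\Omega^c})}$, which combines (15) with $|\Lambda_D| \le 2^{|\mathcal{V}|}$. The short Python sketch below evaluates the base-2 logarithm of this bound; the function name `log2_error_bound` and the numbers used are illustrative assumptions.

```python
from math import log2

def log2_error_bound(rate, T, num_nodes, min_cut_rank, p):
    """log2 of the union bound 2^{RT} * 2^{|V|} * p^{-T * min_cut_rank}.

    rate is in bits per symbol time; min_cut_rank is the smallest
    rank(G_{Omega,Omega^c}) over all source-destination cuts.
    """
    return rate * T + num_nodes - T * min_cut_rank * log2(p)

# Toy numbers: |V| = 6 nodes, min-cut rank 1 over F_2, i.e. capacity 1 bit.
for T in (10, 20, 40):
    below = log2_error_bound(0.5, T, 6, 1, 2)   # rate below the min cut
    above = log2_error_bound(1.5, T, 6, 1, 2)   # rate above the min cut
    print(T, below, above)
```

When the rate is below $\min_\Omega \operatorname{rank}(G_{\Omega,\Omega^c}) \log p$ the exponent decreases linearly in the block length $T$, so the bound vanishes; above it the bound grows and becomes vacuous.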

^3 Just as in the received signals, in clubbing together the transmitted
signals into $u_l(w)$, we put together signals transmitted at the same
time instant. This can be done since we have broken the
network into the clusters/stages with identical path lengths.



Therefore for the layered (equal path) network with
linear deterministic functions, since as seen in Section
II the cut-set bound is also identical to the expression in (21),
we have proved the following result.

Theorem 4.1: Given a layered (equal path) linear
finite-field relay network (with broadcast and multiple
access), the multicast capacity $C$ of such a relay network
is given by

$$C = \min_{D \in \mathcal{D}} \min_{\Omega \in \Lambda_D} \operatorname{rank}(G_{\Omega,\Omega^c}) \log p. \qquad (22)$$

V. Layered networks: general deterministic model

In this section we prove the main Theorems 2.1 and 2.2
for layered networks. We first generalize the encoding
scheme to accommodate arbitrary deterministic functions
of (1) in Section V-A. We then illustrate the ingredients
of the proof using the same example as in Section III-B.
Then we prove the result for layered networks in Section
V-C.

A. Encoding for general deterministic model 

We assume a clocked network as in Section III-A.
Therefore, for such a clocked network, the deterministic
model in (1) implies that

$$y_j^{[t]} = g_j(\{x_i^{[t]}\}_{i \in \mathcal{N}_j}), \quad t = 1, 2, \ldots, T.$$

We have a single source $S$ with message
$W \in \{1, 2, \ldots, 2^{KTR}\}$ which is encoded by the source $S$ into
a signal over $KT$ transmission times (symbols), giving
an overall transmission rate of $R$. We will use strong
(robust) typicality as defined in [11]. The notion of joint
typicality is naturally extended from Definition 5.1.

Definition 5.1: We define $\mathbf{x} \in T_\delta$ if

$$|\nu_{\mathbf{x}}(x) - p(x)| \le \delta p(x),$$

where $\nu_{\mathbf{x}}(x) = \frac{1}{T} |\{t : x_t = x\}|$ is the empirical
frequency.
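Definition 5.1 translates directly into a membership test. The following Python sketch checks whether a sequence is strongly typical with respect to a given distribution; the function name `is_strongly_typical` and the example sequences are illustrative assumptions.

```python
from collections import Counter

def is_strongly_typical(seq, pmf, delta):
    """Return True if |nu(x) - p(x)| <= delta * p(x) for every symbol x,
    where nu(x) is the empirical frequency of x in seq."""
    T = len(seq)
    counts = Counter(seq)
    # A symbol outside the alphabet has p(x) = 0, so it may never appear.
    if any(sym not in pmf for sym in counts):
        return False
    return all(abs(counts.get(sym, 0) / T - prob) <= delta * prob
               for sym, prob in pmf.items())

pmf = {0: 0.5, 1: 0.5}
print(is_strongly_typical([0, 0, 1, 1], pmf, delta=0.1))  # balanced sequence
print(is_strongly_typical([0, 0, 0, 1], pmf, delta=0.1))  # skewed sequence
```

Note that the bound scales with $p(x)$, so a symbol of probability zero must have empirical frequency exactly zero, which the check above enforces.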

Each relay operates over blocks of time $T$ symbols,
and uses a mapping $f_j^{(k)} : \mathcal{Y}_j^T \to \mathcal{X}_j^T$ on its received symbols
from the previous block of $T$ symbols to transmit signals
in the next block. In particular, block $k$ of $T$ received
symbols is denoted by $y_j^{(k)} = \{y_j^{[(k-1)T+1]}, \ldots, y_j^{[kT]}\}$
and the transmit symbols by $x_j^{(k)}$. Choose some product
distribution $\prod_{i \in \mathcal{V}} p(x_i)$. At the source $S$, map each of
the indices $w \in \{1, 2, \ldots, 2^{KTR}\}$ via $f_S^{(k)}(w)$
onto a sequence uniformly drawn from $T_\delta(X_S)$, which
is the typical set of sequences in $\mathcal{X}_S^T$. At any relay
node $j$ choose $f_j^{(k)}$ to map each typical sequence in $\mathcal{Y}_j^T$,
i.e., $T_\delta(Y_j)$, onto the typical set of transmit sequences, i.e.,
$T_\delta(X_j)$, as

$$x_j^{(k)} = f_j^{(k)}(y_j^{(k-1)}), \qquad (23)$$

where $f_j^{(k)}$ is chosen to map uniformly randomly each
sequence in $T_\delta(Y_j)$ onto $T_\delta(X_j)$ and is done independently
for each block $k$. Each relay does the encoding
prescribed by (23). Given the knowledge of all the
encoding functions $f_j^{(k)}$ at the relays and signals received
over $K + |\mathcal{V}| - 2$ blocks, the decoder $D \in \mathcal{D}$ attempts
to decode the message $W$ sent by the source.

B. Proof illustration 

Now, we illustrate the ideas behind the proof of Theorem
2.1 for layered networks using the same example
as in Section III-B, which was done for the linear
deterministic model. Since we are dealing with deterministic
networks, the logic up to (10) in Section III-B remains
the same. We will again illustrate the ideas using the
cut $\Omega = \{S, A_1, B_1\}$. As in Section III-B, a necessary
condition for this set to be the distinguishability set is
that $y_{A_2}(w) = y_{A_2}(w')$, along with $y_{B_2}(w) = y_{B_2}(w')$
and $y_D(w) = y_D(w')$. Notice that as in Section III-B
we are suppressing the block numbers associated with
the received signals. It is clear that for $w = w_1$, the
block numbers associated with $y_{A_2}, y_{B_2}, y_D$ are 1, 2, 3
respectively.

Note that since $y_j \in T_\delta(Y_j)$ with high probability,
we can focus only on the typical received signals.
Let us first examine the probability that $y_{A_2}(w) =
y_{A_2}(w')$. Since $S$ can distinguish between $w, w'$, it
maps these sub-messages independently to two transmit
signals $x_S(w), x_S(w') \in T_\delta(X_S)$, hence we can see that
this probability is

$$P\{(x_S(w'), y_{A_2}(w)) \in T_\delta(X_S, Y_{A_2})\} \doteq 2^{-T I(X_S; Y_{A_2})}. \qquad (24)$$

Now, in order to analyze the probability that $y_{B_2}(w) =
y_{B_2}(w')$, as seen in the linear model analysis, we see
that since $y_{A_2}(w) = y_{A_2}(w')$, $x_{A_2}(w) = x_{A_2}(w')$,
i.e., the same signal is sent under both $w, w'$. Therefore,
since naturally $(x_{A_2}(w), y_{B_2}(w)) \in T_\delta(X_{A_2}, Y_{B_2})$,
obviously $(x_{A_2}(w'), y_{B_2}(w)) \in T_\delta(X_{A_2}, Y_{B_2})$ as well.
Therefore, under $w'$, we already have $x_{A_2}(w')$ to be
jointly typical with the signal that is received under
$w$. However, since $A_1$ can distinguish between
$w, w'$, it will map the transmit sequence $x_{A_1}(w')$
to a sequence which is independent of $x_{A_1}(w)$
transmitted under $w$. Since an error occurs when
$(x_{A_1}(w'), x_{A_2}(w'), y_{B_2}(w)) \in T_\delta(X_{A_1}, X_{A_2}, Y_{B_2})$, and
since $A_2$ cannot distinguish between $w, w'$, so that
$x_{A_2}(w) = x_{A_2}(w')$, we require that $(x_{A_1}, x_{A_2}, y_{B_2})$
generated like $p(x_{A_1}) p(x_{A_2}, y_{B_2})$ behaves like a jointly
typical sequence. Therefore, this probability is given by

$$P\{(x_{A_1}(w'), x_{A_2}(w), y_{B_2}(w)) \in T_\delta(X_{A_1}, X_{A_2}, Y_{B_2})\} \doteq 2^{-T I(X_{A_1}; Y_{B_2}, X_{A_2})} \overset{(a)}{=} 2^{-T I(X_{A_1}; Y_{B_2} | X_{A_2})}, \qquad (25)$$

where $\doteq$ indicates exponential equality (where we
neglect subexponential constants), and (a) follows since
we have generated the mappings $f_j$ independently, which
induces an independent distribution on $X_{A_1}, X_{A_2}$. Another
way to see this is that the probability of (25) is
given by $|T_\delta(X_{A_1} | x_{A_2}, y_{B_2})| / |T_\delta(X_{A_1})|$, which by using
properties of (robustly) typical sequences [11] yields the same
expression as in (25). Note that the calculation in (25) is
similar to one of the error event calculations in a multiple
access channel.

Using a similar logic we can write

$$P\{(x_{B_1}(w'), x_{B_2}(w), y_D(w)) \in T_\delta(X_{B_1}, X_{B_2}, Y_D)\} \doteq 2^{-T I(X_{B_1}; Y_D, X_{B_2})} = 2^{-T I(X_{B_1}; Y_D | X_{B_2})}. \qquad (26)$$

Therefore, putting (24), (25) and (26) together as done in (14) we
get

$$\mathcal{P} \le 2^{-T\{I(X_S; Y_{A_2}) + I(X_{A_1}; Y_{B_2} | X_{A_2}) + I(X_{B_1}; Y_D | X_{B_2})\}}.$$

Note that for this example, due to the Markovian structure
of the network we can see that^4 $I(Y_{\Omega^c}; X_\Omega | X_{\Omega^c}) =
I(X_S; Y_{A_2}) + I(X_{A_1}; Y_{B_2} | X_{A_2}) + I(X_{B_1}; Y_D | X_{B_2})$,
hence as in (15) we get that

$$P_e \le 2^{RT} |\Lambda_D| \, 2^{-T \min_{\Omega \in \Lambda_D} I(Y_{\Omega^c}; X_\Omega | X_{\Omega^c})}, \qquad (27)$$

and hence the error probability can be made as small
as desired if $R < \min_{\Omega \in \Lambda_D} H(Y_{\Omega^c} | X_{\Omega^c})$, since we are
dealing with deterministic networks.

^4 Note that though in the encoding scheme there is a dependence
between $X_{A_1}, X_{A_2}, X_{B_1}, X_{B_2}$ and $X_S$, in the single-letter
form of the mutual information, under a product distribution,
$X_{A_1}, X_{A_2}, X_{B_1}, X_{B_2}, X_S$ are independent of each other.
Therefore for example, $Y_{B_2}$ is independent of $X_{B_2}$ leading to
$H(Y_{B_2} | X_{A_2}, X_{B_2}) = H(Y_{B_2} | X_{A_2})$. Using this argument for the
cut-set expression $I(Y_{\Omega^c}; X_\Omega | X_{\Omega^c})$, we get the expansion.

C. General deterministic model: Proof for layered networks

As in the example illustrating the proof in Section
V-B, the logic of the proof for general deterministic
functions follows that of the linear model quite closely.
In particular, as in Section IV we can define the bipartite
network associated with a cut $\Omega$. Instead of a
transfer matrix $G_{\Omega,\Omega^c}$ associated with the cut, we have
a transfer function $\tilde{G}_\Omega$. Since we are still dealing with a
layered network, as in the linear model case, this transfer
function breaks up into components corresponding to
each of the $l_D$ layers of the network. More precisely,
we can create $d = l_D$ disjoint sub-networks of nodes
corresponding to each layer of the network, with $\beta_l(\Omega)$
nodes at distance $l - 1$ from $S$, on one side and $\gamma_l(\Omega)$
nodes at distance $l$ from $S$, on the other, for $l =
1, \ldots, l_D$. Each of these MIMO clusters has a transfer
function $G_l(\cdot)$, $l = 1, \ldots, l_D$, associated with it.

As in the linear model, each node $i \in \beta_l(\Omega)$ sees a
signal related to $w = w_1$ in block $l_i = l - 1$, and therefore
waits to receive this block and then does a mapping using
the general encoding function given in (23) as

$$x_i^{(l_i)}(w) = f_i^{(l_i)}(y_i^{(l_i)}(w)). \qquad (28)$$

The received signals in the nodes $j \in \gamma_l(\Omega)$ are
deterministic transformations of the transmitted signals from
nodes $\mathcal{T}_l = \{u : (u, v) \in \mathcal{E}, v \in \gamma_l(\Omega)\}$. As in the
linear model analysis of Section IV, the dependence is
on all the transmitting signals at distance $l - 1$ from
the source, not just the ones in $\beta_l \subset \Omega$. Since all the
receivers in $\gamma_l$ are at distance $l$ from $S$, they form the
receivers of the MIMO layer $l$ and we denote this vector
received signal as $z_l(w)$, and this can be done for all
layers $l = 1, \ldots, l_D$. Note that as in the example network
of Section V-B, for all the transmitting nodes in $\mathcal{T}_l$ which
cannot distinguish between $w, w'$ the transmitted signal
would be the same under both $w$ and $w'$. Therefore, all
the nodes in $\mathcal{T}_l \cap \Omega^c$ cannot distinguish between $w, w'$
and therefore

$$x_j(w) = x_j(w'), \quad j \in \mathcal{T}_l \cap \Omega^c.$$

Hence it is clear that since $(\{x_j(w)\}_{j \in \mathcal{T}_l \cap \Omega^c}, z_l(w)) \in
T_\delta$, we have that

$$(\{x_j(w')\}_{j \in \mathcal{T}_l \cap \Omega^c}, z_l(w)) \in T_\delta.$$

Therefore, just as in Section V-B, we see that the
probability that $z_l(w) = z_l(w')$ is given by

$$P\{z_l(w) = z_l(w')\} \doteq 2^{-T H(Z_l | X_{\mathcal{T}_l \cap \Omega^c})}. \qquad (29)$$

Since the events in each of the MIMO stages (clusters)
are independent, we get that

$$P\{z_l(w) = z_l(w'), \ l = 1, \ldots, d\} \doteq 2^{-T \sum_{l=1}^{d} H(Z_l | X_{\mathcal{T}_l \cap \Omega^c})}. \qquad (30)$$

Note that due to the Markovian nature of the layered network,
we see that $\sum_{l=1}^{d} H(Z_l | X_{\mathcal{T}_l \cap \Omega^c}) = H(Y_{\Omega^c} | X_{\Omega^c})$.
From this point onwards the proof closely follows the
steps as in the linear model from (20) onwards. Therefore,
for the layered (equal path) network with general
deterministic functions, the rate in Theorem 2.1 is achievable.
Similarly, in the multicast scenario we declare an error if any
receiver $D \in \mathcal{D}$ makes an error. Since we
have $2^{RT}$ messages, from the union bound we can drive
the error probability to zero if we have

$$R < \max_{\prod_{i \in \mathcal{V}} p(x_i)} \min_{D \in \mathcal{D}} \min_{\Omega \in \Lambda_D} H(Y_{\Omega^c} | X_{\Omega^c}). \qquad (31)$$

Therefore we have proved the following result.

Theorem 5.2: Given a layered (equal path) general
deterministic relay network (with broadcast and multiple
access), we can achieve any rate $R$ from $S$ multicasting
to all destinations $D \in \mathcal{D}$, with $R$ satisfying:

$$R < \max_{\prod_{i \in \mathcal{V}} p(x_i)} \min_{D \in \mathcal{D}} \min_{\Omega \in \Lambda_D} H(Y_{\Omega^c} | X_{\Omega^c}) \qquad (32)$$

VI. Arbitrary networks

Given the proof for layered networks with equal path
lengths, we are ready to tackle the proof of Theorem 2.1
and Theorem 2.2 for general relay networks.

The ingredients are developed below. First, any
network can be unfolded over time to create a
layered deterministic network (this idea was introduced
for graphs in [1] to handle cycles in a graph). The
idea is to unfold the network to $K$ stages such that the $i$-th
stage represents what happens in the network
during symbol times $(i - 1)T$ to $iT - 1$. For example, in
Figure 1(a) a network with unequal paths from $S$ to $D$
is shown. Figure 1(b) shows the unfolded form of this
network. As we notice, each node $v \in \mathcal{V}$ appears
at stage $1 \le i \le K$ as $v[i]$. There are additional nodes:
$T[i]$'s and $R[i]$'s. These nodes are just virtual transmitters
and receivers that are put to buffer and synchronize the
network. Since all communication links connected to
these nodes ($T[i]$'s and $R[i]$'s) are modelled as wireline
links without any capacity limit, they do not impose
any constraint on the network. One should notice that in
general there must be an infinite capacity link between
the same node and itself appearing at different times.
However, here we are omitting these links, which means
we limit the nodes to have a finite memory $T$. Now we
show the following lemma.

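Lemma 6.1: Assume $\mathcal{G}$ is a general deterministic network and $\mathcal{G}_{\text{unf}}^{(K)}$ is a network obtained by unfolding $\mathcal{G}$ over $K$ time steps (as shown in Figure 1). Then the following communication rate is achievable in $\mathcal{G}$:

$R < \frac{1}{K} \max \min_{\Omega_{\text{unf}}} H(Y_{\Omega_{\text{unf}}^c} \mid X_{\Omega_{\text{unf}}^c})$ (33)

where the minimum is taken over all cuts $\Omega_{\text{unf}}$ in $\mathcal{G}_{\text{unf}}^{(K)}$.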



Proof: By unfolding $\mathcal{G}$ we get an acyclic deterministic network such that all the paths from the source to the destination have equal length. Therefore, by Theorem 5.2, we can achieve the rate

$R_{\text{unf}} < \max \min_{\Omega_{\text{unf}}} H(Y_{\Omega_{\text{unf}}^c} \mid X_{\Omega_{\text{unf}}^c})$ (34)

in the time-expanded graph. Since it takes $K$ steps to translate an achievable scheme in the time-expanded graph into an achievable scheme in the original graph, the lemma is proved. ■
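The unfolding step can be illustrated with a short sketch. It is only a sketch under our own assumptions: each edge (u, v) of the original graph is taken to become an edge from u[i] to v[i+1], the virtual buffer nodes T[i] and R[i] are left out, and the four-node example graph and the name `unfold` are hypothetical rather than taken from the paper.

```python
def unfold(nodes, edges, num_stages):
    """Time-expand a directed graph into `num_stages` layers.

    Assumptions (ours, not the paper's exact construction): every edge
    (u, v) of the original graph becomes an edge from u[i] to v[i + 1],
    and the virtual buffer nodes T[i], R[i] are omitted.
    """
    unfolded_nodes = [(v, i) for i in range(1, num_stages + 1) for v in nodes]
    unfolded_edges = [
        ((u, i), (v, i + 1))
        for i in range(1, num_stages)
        for (u, v) in edges
    ]
    return unfolded_nodes, unfolded_edges


# Hypothetical four-node network with a cycle between the relays A and B.
nodes = ["S", "A", "B", "D"]
edges = [("S", "A"), ("S", "B"), ("A", "B"), ("B", "A"), ("A", "D"), ("B", "D")]

unf_nodes, unf_edges = unfold(nodes, edges, num_stages=3)

# Every unfolded edge advances exactly one stage, so the result is acyclic
# and layered even though the original graph contains the cycle A -> B -> A.
is_layered = all(head[1] == tail[1] + 1 for tail, head in unf_edges)
print(len(unf_nodes), len(unf_edges), is_layered)
```

Because every edge moves forward by one stage, any two paths between the same pair of unfolded nodes have the same length, which is the equal-path property that Theorem 5.2 requires.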

If we look at different cuts in the time-expanded graph, we notice that there are two types of cuts. One type separates the nodes at different stages identically. An example of such a steady cut is drawn with a solid line in Figure 1(b); it separates $\{S, A\}$ from $\{B, D\}$ at all stages. Clearly each steady cut in the time-expanded graph corresponds to a cut in the original graph, and moreover its value is $K$ times the value of the corresponding cut in the original network. However, there is another type of cut which does not behave identically at different stages. An example of such a wiggling cut is drawn with a dotted line in Figure 1(b). There is no correspondence between these cuts and the cuts in the original network.
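The distinction between the two kinds of cuts can be made concrete with a small sketch. The representation is our own assumption: a cut of the time-expanded graph is stored as a list holding, for each stage, the set of nodes on the destination side, and only cuts that keep S on the source side and D on the destination side at every stage are counted. The function names are hypothetical.

```python
def is_steady(cut):
    """Return True if the cut separates the nodes identically at every stage.

    `cut` is a list with one destination-side node set per stage.
    """
    return all(stage == cut[0] for stage in cut)


def count_cuts(num_nodes, num_stages):
    """Count steady cuts and all stage-wise cuts.

    Each stage can place every node other than S and D on either side,
    giving 2 ** (num_nodes - 2) choices per stage.
    """
    per_stage = 2 ** (num_nodes - 2)
    return per_stage, per_stage ** num_stages


# The steady cut of Fig. 1(b): {S, A} versus {B, D} at all stages.
steady_cut = [{"B", "D"}, {"B", "D"}, {"B", "D"}]
# A hypothetical wiggling cut whose destination side changes with the stage.
wiggling_cut = [{"B", "D"}, {"A", "B", "D"}, {"D"}]

print(is_steady(steady_cut), is_steady(wiggling_cut))
print(count_cuts(num_nodes=4, num_stages=3))
```

Even for four nodes and three stages the wiggling cuts far outnumber the steady ones, which is why restricting the minimization to steady cuts needs a separate argument.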

Now comparing Lemma 6.1 to the main Theorem 2.1 that we want to prove, we notice that in this lemma the achievable rate is found by taking the minimum of cut-values over all cuts in the time-expanded graph (steady and wiggling ones). However, in Theorem 2.1 we want to prove that we can achieve a rate obtained by taking the minimum of cut-values over only the cuts in the original graph or, equivalently, over the steady cuts in the time-expanded network. So a natural question is whether, in a time-expanded network, it makes any difference if we take the minimum of cut-values over only steady cuts rather than over all cuts. Quite interestingly, we show in the following lemma that asymptotically, as $K \to \infty$, this difference (normalized by $1/K$) vanishes.

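Lemma 6.2: Consider a general deterministic network $\mathcal{G}$. Assume a product distribution on $\{x_i\}_{i \in \mathcal{V}}$, $p(\{x_i\}_{i \in \mathcal{V}}) = \prod_{i \in \mathcal{V}} p(x_i)$. In the time-expanded graph $\mathcal{G}_{\text{unf}}^{(K)}$, assume that for each node $i \in \mathcal{V}$, $\{x_i[t]\}_{1 \le t \le K}$ are distributed i.i.d. according to $p(x_i)$ in the original network. Also, for any $1 \le t_1, t_2 \le K$ and $i \ne j$, $x_i[t_1]$ is independent of $x_j[t_2]$. Then for any cut $\Omega_{\text{unf}}$ on the unfolded graph we have

$(K - L + 1) \min_{\Omega \in \Lambda_D} H(Y_{\Omega^c} \mid X_{\Omega^c}) \le H(Y_{\Omega_{\text{unf}}^c} \mid X_{\Omega_{\text{unf}}^c})$ (35)

where $L = 2^{|\mathcal{V}|-2}$.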



Fig. 1. (a) An example of a general deterministic network with unequal paths from S to D. (b) The corresponding unfolded deterministic network. Examples of steady cuts and wiggling cuts are shown by solid and dotted lines, respectively.



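Now since for any distribution

$\min_{\Omega_{\text{unf}}} H(Y_{\Omega_{\text{unf}}^c} \mid X_{\Omega_{\text{unf}}^c}) \le K \min_{\Omega \in \Lambda_D} H(Y_{\Omega^c} \mid X_{\Omega^c})$ (36)

we have an immediate corollary of this lemma.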

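Corollary 6.3: Assume $\mathcal{G}$ is a general deterministic network and $\mathcal{G}_{\text{unf}}^{(K)}$ is a network obtained by unfolding $\mathcal{G}$ over $K$ time steps. Then

$\lim_{K \to \infty} \frac{1}{K} \max \min_{\Omega_{\text{unf}}} H(Y_{\Omega_{\text{unf}}^c} \mid X_{\Omega_{\text{unf}}^c}) = \max_{\prod_{i \in \mathcal{V}} p(x_i)} \min_{\Omega \in \Lambda_D} H(Y_{\Omega^c} \mid X_{\Omega^c})$ (37)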
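As a numerical illustration of why the corollary follows, the sketch below evaluates the two bounds that sandwich the normalized min-cut of the unfolded graph: the lower bound $(K - L + 1)/K$ times the original min-cut from Lemma 6.2, and the upper bound given by the original min-cut itself from (36). The function name, the four-node network size, and the min-cut value of 1.0 are illustrative assumptions.

```python
def normalized_bounds(num_nodes, num_stages, min_cut):
    """Return (lower, upper) bounds on the unfolded min-cut divided by K.

    lower comes from (K - L + 1) * min_cut <= unfolded min-cut, and
    upper from unfolded min-cut <= K * min_cut, with L = 2 ** (|V| - 2).
    """
    L = 2 ** (num_nodes - 2)
    lower = (num_stages - L + 1) * min_cut / num_stages
    upper = min_cut
    return lower, upper


# Illustrative values: a four-node network whose min-cut is 1.0.
for K in (10, 100, 1000):
    lower, upper = normalized_bounds(num_nodes=4, num_stages=K, min_cut=1.0)
    print(K, lower, upper)
```

The gap between the two bounds is $(L - 1)/K$ times the min-cut, so it shrinks as $K$ grows. For small $K$ (below $L - 1$) the lower bound is negative and therefore carries no information.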

Now by Lemma 6.1 and Corollary 6.3 the proof of the main Theorem 2.1 is complete, so we just need to prove Lemma 6.2. First note that any cut $\Omega_{\text{unf}}$ in the unfolded graph partitions the nodes at each stage $1 \le i \le K$ into $U_i$ (on the left of the cut) and $V_i$ (on the right of the cut). If at some stage $S[i] \in V_i$ or $D[i] \in U_i$, then the cut passes through one of the infinite capacity edges (capacity $Kq$) and hence Lemma 6.2 is obviously proved. Therefore, without loss of generality, assume that $S[i] \in U_i$ and $D[i] \in V_i$ for all $1 \le i \le K$. Now since for each $i \in \mathcal{V}$, $\{x_i[t]\}_{1 \le t \le K}$ are i.i.d., we can write¹

$H(Y_{\Omega_{\text{unf}}^c} \mid X_{\Omega_{\text{unf}}^c}) = \sum_{i=1}^{K-1} H(Y_{V_{i+1}} \mid X_{V_i})$ (38)

For simplification we define 

$\psi(V_1, V_2) = H(Y_{V_2} \mid X_{V_1})$ (39)

then we have the following lemma, whose proof is in 
the appendix. 

Lemma 6.4: Let $V_1, \ldots, V_l$ be $l$ non-identical subsets of $\mathcal{V} - \{S\}$ such that $D \in V_i$ for all $1 \le i \le l$. Also assume a product distribution on $x_i$, $i \in \mathcal{V}$. Then

$\psi(V_1, V_2) + \cdots + \psi(V_{l-1}, V_l) + \psi(V_l, V_1) \ge \sum_{i=1}^{l} \psi(\tilde{V}_i, \tilde{V}_i)$ (40)

where for $k = 1, \ldots, l$,

$\tilde{V}_k = \bigcup_{\{i_1, \ldots, i_k\} \subseteq \{1, \ldots, l\}} (V_{i_1} \cap \cdots \cap V_{i_k})$ (41)

or, in other words, each $\tilde{V}_j$ is the union of $\binom{l}{j}$ sets, each of which is the intersection of $j$ of the $V_i$'s.

A special case of this lemma was recently stated in an independent work in [14] (Lemma 2) in the context of erasure networks with only multiple access and no broadcast.

¹ As in Section V-B, under the product distribution the mutual information expression of the cut-set breaks into a summation.
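To make the sets defined in (41) concrete, the following sketch computes them literally as a union of intersections over all index subsets of a given size. The helper name `v_tilde` and the three example sets are illustrative assumptions, not part of the paper.

```python
from itertools import combinations


def v_tilde(sets, k):
    """Union, over all k-element index subsets, of the intersection of the
    chosen sets. This follows definition (41) term by term."""
    result = set()
    for indices in combinations(range(len(sets)), k):
        intersection = set(sets[indices[0]])
        for i in indices[1:]:
            intersection &= sets[i]
        result |= intersection
    return result


# Hypothetical destination-side sets, each containing D but not S.
V = [{"A", "D"}, {"B", "D"}, {"A", "B", "D"}]
tildes = [v_tilde(V, k) for k in (1, 2, 3)]
for k, t in enumerate(tildes, start=1):
    print(k, sorted(t))
```

In this example A and B each belong to two of the three sets and D belongs to all three, so the first two constructed sets equal {A, B, D} and the third equals {D}. An element ends up in the k-th set exactly when it belongs to at least k of the original sets.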

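Now we are ready to prove Lemma 6.2.

Proof (of Lemma 6.2): We have

$H(Y_{\Omega_{\text{unf}}^c} \mid X_{\Omega_{\text{unf}}^c}) = \sum_{i=1}^{K-1} H(Y_{V_{i+1}} \mid X_{V_i}) = \sum_{i=1}^{K-1} \psi(V_i, V_{i+1})$ (42)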

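Now look at the sequence of $V_i$'s. Note that there are a total of $L = 2^{|\mathcal{V}|-2}$ possible subsets of $\mathcal{V}$ that contain $D$ but not $S$. Assume that $V_s$ is the first set that is revisited, and that it is revisited at step $V_{s+l}$. Therefore by Lemma 6.4 we have

$\sum_{i=s}^{s+l-1} \psi(V_i, V_{i+1}) \ge \sum_{i=1}^{l} \psi(\tilde{V}_i, \tilde{V}_i)$ (43)

where the $\tilde{V}_i$'s are described in Lemma 6.4. Now note that each of those $\tilde{V}_i$ contains $D$ but not $S$ and hence describes a cut in the original graph. Therefore $\psi(\tilde{V}_i, \tilde{V}_i) \ge \min_{\Omega \in \Lambda_D} H(Y_{\Omega^c} \mid X_{\Omega^c})$ and hence

$\sum_{i=s}^{s+l-1} \psi(V_i, V_{i+1}) \ge l \min_{\Omega \in \Lambda_D} H(Y_{\Omega^c} \mid X_{\Omega^c})$ (44)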

which means that the value of that loop is at least the length of the loop times the min-cut of the original graph. Now since in any $L - 1$ time frame there is at least one loop, everything except at most a path of length $L - 1$ can be replaced with the value of the min-cut in $\sum_{i=1}^{K-1} \psi(V_i, V_{i+1})$. Therefore,

$\sum_{i=1}^{K-1} \psi(V_i, V_{i+1}) \ge (K - L + 1) \min_{\Omega \in \Lambda_D} H(Y_{\Omega^c} \mid X_{\Omega^c})$ (45)

■
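The loop-finding step of this proof can be sketched in code. The sketch assumes a cut is given as a list of destination-side sets, one per stage, and uses 0-based positions; the function name and the example sequence are hypothetical.

```python
def first_revisit(cut_sequence):
    """Return (s, l) such that the set at position s is the first one to be
    revisited, and it reappears at position s + l. Return None if no set
    repeats. Positions are 0-based."""
    first_seen = {}
    for position, stage in enumerate(cut_sequence):
        key = frozenset(stage)
        if key in first_seen:
            start = first_seen[key]
            return start, position - start
        first_seen[key] = position
    return None


# Hypothetical wiggling cut on a four-node network (L = 2 ** (4 - 2) = 4).
sequence = [{"D"}, {"A", "D"}, {"B", "D"}, {"A", "D"}, {"D"}]
print(first_revisit(sequence))
```

Here the set {A, D} at position 1 is revisited at position 3, giving a loop of length 2. Since only $L$ distinct sets are available, any sequence of more than $L$ stages must contain such a loop, and removing loops one at a time leaves a loop-free remainder of bounded length.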
Appendix
Proof of Lemma 6.4

First we state a few lemmas, some of whose proofs are very straightforward and hence omitted.

Lemma 1.1: The $\tilde{V}_i$'s defined in Lemma 6.4 satisfy

$\tilde{V}_l \subseteq \tilde{V}_{l-1} \subseteq \cdots \subseteq \tilde{V}_1$ (46)

Lemma 1.2: Let $V_1, \ldots, V_l$ be $l$ non-identical subsets of $\mathcal{V} - \{S\}$ such that $D \in V_i$ for all $1 \le i \le l$. Also assume that $\tilde{V}_1, \ldots, \tilde{V}_l$ are as defined in Lemma 6.4. Then for any $v \in \mathcal{V}$ we have

$|\{i \mid v \in V_i\}| = |\{j \mid v \in \tilde{V}_j\}|$ (47)

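Proof: This lemma just states that for each $v \in \mathcal{V}$ the number of times that $v$ appears in the $V_i$'s is equal to the number of times that $v$ appears in the $\tilde{V}_j$'s. To prove it, assume that the number of $V_i$'s in which $v$ appears is $n$. Then clearly

$v \in \tilde{V}_j, \quad j = 1, \ldots, n$ (48)

Now for any $j > n$, any element that appears in $\tilde{V}_j$ must appear in at least $j$ of the $V_i$'s, and since $v$ only appears in $n$ of the $V_i$'s,

$v \notin \tilde{V}_j, \quad j > n$ (49)

therefore

$|\{i \mid v \in V_i\}| = |\{j \mid v \in \tilde{V}_j\}| = n$ (50)

■

Lemma 1.3: Let $V_1, \ldots, V_l$ be $l$ non-identical subsets of $\mathcal{V} - \{S\}$ such that $D \in V_i$ for all $1 \le i \le l$. Also assume a product distribution on $x_i$, $i \in \mathcal{V}$. Then

$H(X_{V_1}) + \cdots + H(X_{V_l}) = H(X_{\tilde{V}_1}) + \cdots + H(X_{\tilde{V}_l})$ (51)

where the $\tilde{V}_i$'s are defined in Lemma 6.4 and $H(\cdot)$ is just the binary entropy function.

Proof: For any $v \in \mathcal{V}$ define

$n_v = |\{i \mid v \in V_i\}|$ (52)

and

$\tilde{n}_v = |\{j \mid v \in \tilde{V}_j\}|$ (53)

Now since the $x_i$, $i \in \mathcal{V}$, are independent of each other we have

$H(X_{V_1}) + \cdots + H(X_{V_l}) = \sum_{v \in \mathcal{V}} n_v H(X_v)$ (54)

and

$H(X_{\tilde{V}_1}) + \cdots + H(X_{\tilde{V}_l}) = \sum_{v \in \mathcal{V}} \tilde{n}_v H(X_v)$ (55)

By Lemma 1.2 we know that $n_v = \tilde{n}_v$ for all $v \in \mathcal{V}$, hence the lemma is proved. ■

The following lemma is just a straightforward generalization of submodularity to more than two sets (see also [8], Theorem 5, where this result is applied to the entropy function, which is submodular).

Lemma 1.4: Let $V_1, \ldots, V_k$ be a collection of sets. Assume that $\xi(\cdot)$ is a submodular function. Then

$\xi(V_1) + \cdots + \xi(V_k) \ge \xi(\tilde{V}_1) + \cdots + \xi(\tilde{V}_k)$ (56)

where the $\tilde{V}_i$'s are defined in Lemma 6.4.

Now we are ready to prove Lemma 6.4. First note that, with the convention $V_0 = V_l$,

$\psi(V_1, V_2) + \cdots + \psi(V_{l-1}, V_l) + \psi(V_l, V_1) = \sum_{i=1}^{l} \left[ H(Y_{V_i}, X_{V_{i-1}}) - H(X_{V_{i-1}}) \right]$

and

$\sum_{i=1}^{l} \psi(\tilde{V}_i, \tilde{V}_i) = \sum_{i=1}^{l} H(Y_{\tilde{V}_i} \mid X_{\tilde{V}_i})$ (57)

$= \sum_{i=1}^{l} \left[ H(Y_{\tilde{V}_i}, X_{\tilde{V}_i}) - H(X_{\tilde{V}_i}) \right]$ (58)

Now define the set

$W_i = \{Y_{V_i}, X_{V_{i-1}}\}, \quad i = 1, \ldots, l$ (59)

where $V_0 = V_l$. Since by Lemma 1.3 we have

$\sum_{i=1}^{l} H(X_{V_i}) = \sum_{i=1}^{l} H(X_{\tilde{V}_i})$ (60)

we just need to prove that

$\sum_{i=1}^{l} H(Y_{V_i}, X_{V_{i-1}}) \ge \sum_{i=1}^{l} H(Y_{\tilde{V}_i}, X_{\tilde{V}_i})$ (61)

Now since entropy is a submodular function, by Lemma 1.4 ($k$-way submodularity) we have

$\sum_{i=1}^{l} H(W_i) \ge \sum_{i=1}^{l} H(\tilde{W}_i)$ (62)

where

$\tilde{W}_r = \bigcup_{\{i_1, \ldots, i_r\} \subseteq \{1, \ldots, l\}} (W_{i_1} \cap \cdots \cap W_{i_r}), \quad r = 1, \ldots, l$ (63)

Now for any $r$ ($1 \le r \le l$) we have

$\tilde{W}_r = \bigcup_{\{i_1, \ldots, i_r\} \subseteq \{1, \ldots, l\}} (W_{i_1} \cap \cdots \cap W_{i_r})$

$= \bigcup_{\{i_1, \ldots, i_r\} \subseteq \{1, \ldots, l\}} \left( \{Y_{V_{i_1}}, X_{V_{i_1 - 1}}\} \cap \cdots \cap \{Y_{V_{i_r}}, X_{V_{i_r - 1}}\} \right)$

$= \left\{ Y_{\bigcup_{\{i_1, \ldots, i_r\}} (V_{i_1} \cap \cdots \cap V_{i_r})},\ X_{\bigcup_{\{i_1, \ldots, i_r\}} (V_{i_1 - 1} \cap \cdots \cap V_{i_r - 1})} \right\}$

$= \{Y_{\tilde{V}_r}, X_{\tilde{V}_r}\}$

Therefore by equation (62) we have

$\sum_{i=1}^{l} H(W_i) \ge \sum_{i=1}^{l} H(\tilde{W}_i)$ (64)

$= \sum_{i=1}^{l} H(Y_{\tilde{V}_i}, X_{\tilde{V}_i})$ (65)

Hence the Lemma is proved.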
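The counting identity (47) and the $k$-way submodularity inequality (56) can be spot-checked numerically. The sketch below is only a sanity check under our own assumptions: the constructed sets are computed through the equivalent "belongs to at least k of the sets" characterization used in the proof of Lemma 1.2, the submodular function is the square root of the cardinality (an illustrative choice), and the random test sets are arbitrary.

```python
import math
import random


def tilde_sets(sets):
    """Return [V_tilde_1, ..., V_tilde_l], where V_tilde_k holds the
    elements that belong to at least k of the given sets."""
    universe = set().union(*sets)
    count = {v: sum(v in s for s in sets) for v in universe}
    return [
        {v for v in universe if count[v] >= k}
        for k in range(1, len(sets) + 1)
    ]


def xi(a):
    # Illustrative submodular function (our choice): a concave function
    # of the cardinality, here the square root.
    return math.sqrt(len(a))


def check(sets):
    """Check the counting identity (47) and the inequality (56)."""
    tildes = tilde_sets(sets)
    universe = set().union(*sets)
    counts_match = all(
        sum(v in s for s in sets) == sum(v in t for t in tildes)
        for v in universe
    )
    # A small tolerance guards against floating-point rounding.
    submodular_ok = sum(map(xi, sets)) >= sum(map(xi, tildes)) - 1e-12
    return counts_match, submodular_ok


random.seed(0)
trials = [
    [set(random.sample(range(6), random.randint(1, 6))) for _ in range(4)]
    for _ in range(200)
]
print(all(check(sets) == (True, True) for sets in trials))
```

A passing check is not a proof, but it gives a quick way to catch an indexing mistake when applying the lemmas to a concrete family of sets.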



